ABSTRACT

We describe a new technique for heterodyne spectroscopy, which we call Least-Squares
Frequency Switching, or LSFS. This technique avoids the need for a traditional reference
spectrum, which, when combined with the on-source spectrum, introduces both noise and
systematic artifacts such as "baseline wiggles". In contrast, LSFS derives the spectrum
directly, and in addition the instrumental gain profile. The resulting spectrum retains
nearly the full theoretical sensitivity and introduces no systematic artifacts.

Here we discuss mathematical details of the technique and use numerical experiments to
explore optimum observing schemas. We outline a modification suitable for computationally
difficult cases as the number of spectral channels grows beyond several thousand. We
illustrate the method with three real-life examples. In one of practical interest, we created
a large contiguous bandwidth by aligning three smaller bandwidths end-to-end; radio
astronomers are often faced with the need for a larger contiguous bandwidth than is
provided with the available correlator.

1. INTRODUCTION

In digital heterodyne spectroscopy, the measured spectrum is the product of the radio- 
frequency (RF) power and the intermediate-frequency (IF) gain spectra; to obtain the RF 
power spectrum, one must divide the on-source measured spectrum (the ON spectrum) by 
the IF gain spectrum. This is usually accomplished by dividing by a reference spectrum 
(the OFF spectrum), which is obtained by moving off in frequency or position. However, 
using such OFF spectra introduces additional noise because some observing time is spent 
off-source, and also introduces additional artifacts ("baseline wiggles"). 

In particular, obtaining accurate spectral profiles for Galactic HI is difficult because
a good OFF spectrum is hard to obtain. There is no place in the sky where
Galactic HI does not exist, so one cannot use an OFF position. Instead, the OFF spectrum
is commonly obtained by taking an off-frequency spectrum, moving far enough in frequency
that the HI line is zero. This technique is known as "frequency switching".

At many telescopes, frequency switching produces inaccurate OFF spectra. This occurs
because the RF gain and/or the RF power have frequency structure. One contributor is
reflections on the telescope structure, for example between the feed and the reflector. If
their separation distance is $D$, then the reflected signal returns with a time delay
$\tau = 2D/c$. This produces a peak in the autocorrelation function of the received signal at
delay $\tau$, which in turn produces a sinusoidal ripple in the frequency spectrum with period
$f_r = 1/\tau = c/2D$. At Arecibo, $f_r \sim 1.0$ MHz, equivalent to about 200 km s$^{-1}$, which is
comparable to the velocity ranges of interest for many HI studies. Similarly, mm-wave
telescopes used for molecular emission are much smaller than Arecibo and the line
frequencies are much larger, and again the ripple is comparable to interesting line widths.
Telescopes typically have many reflecting paths with different delays, so the received signal
has a superposition of ripples with somewhat different periods. These ripples cannot be
removed by frequency switching, and in fact are sometimes amplified by an unfortunate
choice for the frequency-switching interval.
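
As a quick numerical check of the ripple period, the following minimal sketch evaluates the round-trip delay relation $f_r = c/2D$ and converts the result to a velocity width at the HI line. The separation of 150 m is an assumed value, chosen only because it reproduces the ~1 MHz ripple quoted for Arecibo; the function names are illustrative.

```python
C_LIGHT = 299_792_458.0          # speed of light, m/s
HI_LINE_HZ = 1420.405751768e6    # rest frequency of the 21-cm HI line, Hz


def ripple_period_hz(separation_m):
    """Ripple period f_r = 1/tau for a round-trip delay tau = 2D/c."""
    delay_s = 2.0 * separation_m / C_LIGHT
    return 1.0 / delay_s


def velocity_width_km_s(delta_f_hz, line_hz=HI_LINE_HZ):
    """Doppler velocity interval corresponding to a frequency interval."""
    return C_LIGHT * delta_f_hz / line_hz / 1.0e3


# Assumed feed-to-reflector separation (hypothetical, not taken from the text).
f_r = ripple_period_hz(150.0)
v_r = velocity_width_km_s(f_r)
print(f"ripple period = {f_r / 1e6:.3f} MHz, velocity width = {v_r:.0f} km/s")
```

A ripple of this period spans roughly the same velocity range as a typical Galactic HI profile, which is why it cannot be treated as a slowly varying baseline.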

Here we describe a new approach. Instead of switching the local oscillator (LO) frequency
and hoping for good cancellation of the ripple, we set the LO frequency to a number
of different values so that we can evaluate the RF power spectrum and the IF gain spectrum
as distinct entities using a least-squares technique; we call this Least-Squares Frequency
Switching, or LSFS. We begin in §2 by reviewing the conventional switching techniques; these
introduce extra noise and baseline artifacts, both of which are reduced or eliminated by LSFS.

The rest of the paper is devoted to LSFS. §3 describes the basics of the technique. §4
illustrates our first observational attempt, in which we created a large contiguous bandwidth
by aligning three smaller bandwidths end-to-end. The method relies on choosing sensible
LO frequencies, and with unwise choices the least-squares matrices can be degenerate; §5
discusses this problem and its solution using Singular Value Decomposition. §7 presents
several schemas for choosing LO frequencies, and §8 presents the results of numerical
experiments that evaluate the quality of these schemas. Up to this point, all of the discussion is
directed towards total power in a single polarization, or alternatively Stokes $I$; §10 shows how
the technique applies to the polarized Stokes parameters $(Q, U, V)$. Finally, §11 compares
switching with LSFS and §12 is a summary.
2. REVIEW OF FUNDAMENTALS: POSITION AND FREQUENCY SWITCHING

In heterodyne spectroscopy, we convert the radio-frequency (RF) spectrum to a low
intermediate frequency (IF). This conversion is done by multiplying RF power by the local
oscillator (LO) in the mixer. Symbolically, denoting frequency by $f$, we multiply $f_{RF}$ by
$f_{LO}$. This multiplication generates the sum and difference frequencies $(f_{RF} + f_{LO})$ and
$f_{IF} = (f_{RF} - f_{LO})$; we remove the sum with a suitable low-pass filter, leaving the desired
near-baseband $f_{IF}$.

The mixer is a transition point between RF and IF frequencies. We can meaningfully
discuss the RF and IF sections as separate entities. Thus, the RF section receives from the
sky the antenna temperature $T_A(f_{RF})$ and also has the receiver contribution $T_R(f_{RF})$, which
is often much larger. These are multiplied by the RF transfer function, known as the gain
$G_{RF}(f_{RF})$. Most of the frequency dependence of $G_{RF}(f_{RF})$ occurs in the feed and electronics,
which operate on both $T_A$ and $T_R$, at least to a first approximation. However, often the
antenna temperature suffers an additional frequency-dependent gain, which occurs because
the incoming power reflects from various portions of the telescope structure and interferes
with itself. For simplicity, we neglect this difference and assume that the RF gain affects $T_A$
and $T_R$ equally. Thus, symbolically, the RF power into the mixer $S_{RF}(f_{RF})$ is equal to

$$S_{RF}(f_{RF}) = G_{RF}(f_{RF}) \left[ T_A(f_{RF}) + T_R(f_{RF}) \right] . \qquad (1)$$

The IF section has a transfer function, or gain, $G_{IF}(f_{IF})$. A well-designed system has no
additional power contributed at IF. The spectral power measured by the digital spectrometer
is provided as a function of the IF frequency, so the appropriate symbol is $P_{IF}(f_{IF})$, which
is given by

$$P_{IF}(f_{IF}) = G_{IF}(f_{IF}) \, S_{RF}(f_{RF}) . \qquad (2)$$

The relationship between the spectral channels and IF frequency is fixed. We access different
portions of the RF spectrum by changing the LO frequency.

We now break $T_A(f_{RF})$ and $T_R(f_{RF})$ into frequency-independent ("continuum")
and frequency-dependent ("spectral") portions to simplify further development. With these
decompositions, the measured spectral power $P_{IF}(f_{IF})$ depends on the following quantities:

1. $T_A(f_{RF})$. The subscript $A$ means "antenna temperature", so this is the spectral line
contribution from the sky. The explicit presence of the dependence $(f_{RF})$ means that
there is a spectral dependence, as befits a spectral line or the usually slowly-varying
continuum radiation.

2. $T_A$, the frequency-independent portion of the antenna-temperature continuum
contribution. Continuum radiation is weakly dependent on frequency; the
frequency-dependent portion is incorporated into $T_A(f_{RF})$.

3. $T_R(f_{RF})$. The subscript $R$ means "receiver temperature", and includes all non-antenna
contributions. As with $T_A(f_{RF})$, the explicit dependence on $(f_{RF})$ denotes only the
frequency-dependent portion. For many systems, spectral variations in $T_R(f_{RF})$ are
fractionally small.

4. $T_R$, the receiver-temperature continuum (frequency-independent) contribution.

5. $G_{RF}(f_{RF})$, the RF gain (dependent only on RF frequency). For many systems, $G_{RF}(f_{RF})$
varies slowly with frequency.

6. $G_{IF}(f_{IF})$, the IF gain (dependent on IF frequency). With digital spectroscopy, we
must limit the bandwidth by an appropriate IF bandpass filter. This means that $G_{IF}$
varies severely across the band, from zero to full gain.

With these definitions equation (1) becomes a bit more elaborate. When we look at a
source in the sky we measure the on-source (ON) spectrum

$$P(f_{IF}) = G_{IF}(f_{IF}) \, G_{RF}(f_{RF}) \left[ (T_A(f_{RF}) + T_A) + (T_R(f_{RF}) + T_R) \right] . \qquad (3)$$

Our goal is to disentangle the sky contribution from everything else, i.e. to obtain
$(T_A(f_{RF}) + T_A)$. Because we are primarily interested in spectroscopy, a modified goal is to
obtain only the spectral portion $T_A(f_{RF})$.

We cannot do either of these without dealing with the two gains and the contributions
from $T_R$. It is traditional to deal with these extraneous quantities by taking a reference
spectrum, usually denoted the off-source (OFF) spectrum, and arithmetically combining it with
the ON spectrum by taking $(\mathrm{ON} - \mathrm{OFF})/\mathrm{OFF}$. This process is commonly known as
"switching". It works well if the frequency dependencies of the extraneous quantities are
benign, but this is not always the case. Let us examine the results of this switching process.

Below we consider two ways of obtaining the OFF, one by moving off in position and
one by moving off in frequency. Let primed quantities indicate the OFF measurements and
unprimed the ON. Further, let us simplify the problem by assuming the spectral dependence
of the receiver temperature to be small, i.e.

$$T_R(f_{RF}) \ll T_R . \qquad (4)$$

This allows us to make Taylor expansions of the expression $(\mathrm{ON} - \mathrm{OFF})/\mathrm{OFF}$ for the two
types of switching, dropping terms higher than first order. (We make these expansions to
minimize complication; if higher-order terms are included, the expressions become more
complicated but the techniques can still be applied.)



2.1. Position Switching 



When the astronomical source is limited in angular extent we can obtain the OFF
spectrum by pointing the telescope away from the source. For simplicity, we further assume
that the OFF position has no line. Remembering that $T_R = T'_R$, this gives

$$\frac{\mathrm{ON} - \mathrm{OFF}}{\mathrm{OFF}} = \frac{P(f_{IF}) - P'(f_{IF})}{P'(f_{IF})} = \frac{T_A(f_{RF}) + (T_A - T'_A)}{T'_A + T_R} \left[ 1 - \frac{T_R(f_{RF})}{T'_A + T_R} \right] . \qquad (5)$$



This gives the desired quantity $T_A(f_{RF})$ plus the additive constant $(T_A - T'_A)$, which is
the difference between the antenna temperatures of the two positions. The result is further
contaminated by the right-hand multiplicative factor. In effect, this is a frequency-dependent
gain. However, its effect on the line shape is small because of our assumption of equation (4).
These small effects mean that position switching is usually the technique of choice.

However, this does not mean that position switching always provides good results. If
the difference in continuum temperatures $(T_A - T'_A)$ is large, then its multiplication by the
right-hand multiplicative factor produces a large effect, and this can make it impossible to
distinguish the astronomical spectral line $T_A(f_{RF})$. Thus, position switching can fail for
weak lines with strong continuum sources.



2.2. Frequency Switching 

If the spectral line is sufficiently spatially extended then we cannot position switch.
The prime example is Galactic HI. Here one normally moves off in frequency. This means
that the ON and OFF RF frequencies differ, i.e. $f_{RF} \neq f'_{RF}$. In particular, the RF gains
differ between the ON and OFF measurements; also, the continuum antenna and receiver
temperatures subtract out, i.e. $T_A = T'_A$ and $T_R = T'_R$. Again, to simplify, we assume the
ON (unprimed) and OFF (primed) gains do not differ much, i.e. we define

$$\frac{\Delta G}{G} = \frac{G_{RF}(f_{RF}) - G_{RF}(f'_{RF})}{G_{RF}(f'_{RF})} \qquad (6a)$$

and assume

$$\frac{\Delta G}{G} \ll 1 . \qquad (6b)$$

The differing gains introduce a further complication into the Taylor expansion:

$$\frac{P(f_{IF}) - P'(f_{IF})}{P'(f_{IF})} = \frac{T_A(f_{RF}) + \left( T_R(f_{RF}) - T'_R(f'_{RF}) \right) + \frac{\Delta G}{G} \left( T_A + T_R + T_A(f_{RF}) \right)}{(T_A + T_R)} \left[ 1 - \frac{T'_R(f'_{RF})}{T_A + T_R} \right] . \qquad (7)$$



This is similar to equation (5) except for the additive term $\frac{\Delta G}{G}(T_A + T_R + T_A(f_{RF}))$ in the
first factor on the right-hand side. Even though $\Delta G/G \ll 1$, this term is disastrous because it
operates on $T_R$, which is large. Unless $\Delta G/G$ is very much smaller than unity, this combination
produces serious baseline contamination in frequency switching. Nevertheless, frequency
switching works well when, as is often the case, $\Delta G/G$ varies smoothly and slowly with $f_{RF}$
so that it is well-represented by a low-order polynomial fit.
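
The size of this baseline term can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: a 30 K system temperature, a 2 K Gaussian line, a 1% sinusoidal ripple in the RF gain, and an OFF band displaced by one full bandwidth. All of these numbers and the function names are hypothetical choices for illustration, and the spectral part of the receiver temperature is set to zero.

```python
import numpy as np


def switched_spectrum(on, off, t_sys):
    """(ON - OFF)/OFF, scaled to temperature units by the system temperature."""
    return (on - off) / off * t_sys


n_chan, shift = 256, 256              # OFF band lies one full band away (assumed)
i = np.arange(n_chan)
t_sys = 30.0                          # continuum T_A + T_R in kelvin (assumed)
line = 2.0 * np.exp(-0.5 * ((i - 128) / 10.0) ** 2)   # spectral line T_A(f_RF), K
g_if = 0.2 + np.hanning(n_chan)       # arbitrary IF bandpass; cancels in the ratio


def g_rf(channel, ripple, period=100.0):
    """RF gain with a sinusoidal ripple of fractional amplitude `ripple`."""
    return 1.0 + ripple * np.cos(2.0 * np.pi * channel / period)


def observe(ripple):
    on = g_if * g_rf(i, ripple) * (line + t_sys)
    off = g_if * g_rf(i + shift, ripple) * t_sys       # no line in the OFF band
    return switched_spectrum(on, off, t_sys)


flat = observe(0.0)                   # flat RF gain: the line is recovered
rippled = observe(0.01)               # 1% ripple: Delta G / G multiplies t_sys
baseline_error = np.max(np.abs(rippled - line))
print(f"worst baseline error with 1% ripple: {baseline_error:.2f} K")
```

With a flat RF gain the IF bandpass cancels and the line is recovered. With the ripple, the residual is the gain difference multiplied by the full system temperature, so a 1% ripple already produces a baseline error that is a sizeable fraction of the 2 K line.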



3. DETERMINATION OF $G_{IF}(f_{IF})$ BY LEAST-SQUARES FREQUENCY SWITCHING (LSFS)

The classical approaches of position and frequency switching work only under good
conditions. The quantity having the most severe frequency variations is $G_{IF}(f_{IF})$. If we
could determine this quantity explicitly we could forgo the switching and, instead, simply
divide all measured spectra $P(f_{IF})$ by $G_{IF}(f_{IF})$. Least-squares frequency switching (LSFS)
does this explicit determination.

3.1. The Basic Equations and their Iterative Solution 

We begin by rewriting equation (3) in a much simpler form. Its right-hand side is the
product of the IF gain and several RF quantities. We lump these RF quantities into a single
one, $S_{RF}(f_{RF})$:

$$S_{RF}(f_{RF}) = G_{RF}(f_{RF}) \left[ (T_A(f_{RF}) + T_A) + (T_R(f_{RF}) + T_R) \right] \qquad (8)$$

so we rewrite equation (3) as

$$P(f_{IF}) = G_{IF}(f_{IF}) \, S_{RF}(f_{RF}) . \qquad (9)$$

Our technique extracts $G_{IF}(f_{IF})$ and $S_{RF}(f_{RF})$ as separate entities.

To proceed, we first express frequencies as channel offsets. The digital spectrometer
produces a spectrum having $I$ channels, with channel number $i$ ranging from $i = 0$ to
$i = I - 1$. The frequency separation between adjacent channels is $\Delta f$. Thus, for the IF
frequency of channel $i$ we can write

$$f_{IF,i} = f_0 + i \, \Delta f , \qquad (10)$$

where $f_0$ is a constant. The separation $\Delta f$ also applies to $f_{RF}$, so apart from a possible
additive constant we have for the RF frequency of channel $i$

$$f_{RF,i} = f_0 + f_{LO} + i \, \Delta f , \qquad (11)$$

where $f_{LO}$ is the LO frequency.

In LSFS, we make measurements at $N$ different LO frequencies, each designated by $n$.
We increment these frequencies in units of $\Delta f$, and we write

$$f_{LO,n} = f_{LO,n=0} + \Delta i_n \, \Delta f , \qquad (12)$$

where $n$ ranges from 0 to $N - 1$. $\Delta i_n$ is the number of channels that $f_{LO,n}$ is offset from
$f_{LO,n=0}$; clearly, $\Delta i_{n=0} = 0$. For convenience we assume $\Delta i_n$ to increase monotonically with
$n$, so the maximum LO excursion is $\Delta i_{N-1}$. We can write all frequencies in units of the
channel separation $\Delta f$, so the RF frequencies become expressed as digital indices $i + \Delta i_n$,
where $i$ is the IF frequency offset from spectral channel zero and $\Delta i_n$ is the LO frequency
offset from the lowest LO frequency (at $n = 0$), both in units of the channel width $\Delta f$.
Equation (9) becomes

$$P_{i,\Delta i_n} = G_i \, S_{i+\Delta i_n} . \qquad (13)$$

§3.3 presents the simplest "textbook" example, including a figure, to help explain the
somewhat confusing relationships embodied in the above description.
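
The index bookkeeping of equation (13) can be sketched in a few lines. The function below is a hypothetical helper, not part of the original analysis: it builds the $N \times I$ array of measured powers from a gain vector of length $I$ and an RF power vector of length $I + \Delta i_{N-1}$, using made-up integer values so that each product is easy to trace by hand.

```python
import numpy as np


def simulate_measurements(gain, rf_power, offsets):
    """Return P[n, i] = G[i] * S[i + offsets[n]], as in equation (13)."""
    gain = np.asarray(gain, dtype=float)
    rf_power = np.asarray(rf_power, dtype=float)
    offsets = np.asarray(offsets, dtype=int)
    if rf_power.size != gain.size + offsets.max():
        raise ValueError("rf_power must have I + max(offsets) channels")
    # Row n holds the RF channel seen by each IF channel at LO setting n.
    rf_index = np.arange(gain.size)[None, :] + offsets[:, None]
    return gain[None, :] * rf_power[rf_index]


# Toy numbers: I = 4 IF channels, LO offsets of 0, 1 and 3 channels.
P = simulate_measurements(gain=[1, 2, 3, 4],
                          rf_power=[1, 2, 3, 4, 5, 6, 7],
                          offsets=[0, 1, 3])
print(P)
```

Each row is one LO setting: the gain pattern stays fixed in IF channel while the RF spectrum slides underneath it by the corresponding offset.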

There are $NI$ of these equations. We could use them as our equations of condition for
the least-squares fit. However, for reasons discussed below, we normalize the variables for
computational efficiency. To normalize, we require the means over $(i, n)$, denoted by
$\langle P_{i,\Delta i_n} \rangle$ and $\langle S_{i+\Delta i_n} \rangle$, to equal unity; of course, this implies that the typical
$G_i \sim 1$. Henceforth we assume $P$ and $S$ to be so normalized.

In addition, we express $S_{i+\Delta i_n}$ (whose mean is unity) as an offset $s_{i+\Delta i_n}$ from unity, i.e.
we write

$$S_{i+\Delta i_n} = 1 + s_{i+\Delta i_n} . \qquad (14)$$

Clearly, the mean $\langle s_{i+\Delta i_n} \rangle = 0$. Below we will assume $s \ll 1$, which should be valid as
long as the total fractional bandwidth $(I + \Delta i_{N-1}) \, \Delta f / f_{RF}$ is not too large and, also, there are
no strong spectral lines. This assumption will be made only for reasons of computational
efficiency and does not affect the final solution. For $P_{i,\Delta i_n}$, we can replace the index $\Delta i_n$ by
the index $n$ to reduce clutter. Equation (13) becomes

$$P_{i,n} = G_i + G_i \, s_{i+\Delta i_n} . \qquad (15)$$

The $P_{i,n}$ are measured quantities and the $G_i$ and $s_{i+\Delta i_n}$ are unknowns to be determined
by a least-squares fit. With $I$ spectral channels, $N$ LO frequencies, and a maximum LO
excursion of $\Delta i_{N-1}$ frequency channels, the total number of unknowns is $a = (2I + \Delta i_{N-1})$:
there are $I$ unknown values of $G$ and $(I + \Delta i_{N-1})$ values of $s$. For example, for the calibration
spectrum of Arecibo's GALFA spectrometer (Stanimirovic et al. 2006), we have 512 channels
so $I = 512$ and we use $\Delta i_{N-1} = 31$, so there are 1055 unknowns. This is a substantial, but
hardly impossible, least squares problem. Its solution requires, in essence, the inversion of a
$1055 \times 1055$ matrix.

We must solve this set of equations using nonlinear least-squares techniques, which
are required because both $G$ and $s$ are unknown. Nonlinear least-squares is an iterative
process, involving making a guess for the parameters and solving for the difference between
the guesses and the true values. Let the guessed values of the parameters be denoted by the
superscript $g$. Then, taking the difference between the true values and the guessed values,
and using equation (15), we have

$$P_{i,n} - P^g_{i,n} = \left( G_i + G_i \, s_{i+\Delta i_n} \right) - \left( G^g_i + G^g_i \, s^g_{i+\Delta i_n} \right) . \qquad (16)$$

We express the differences between quantities with the symbol $\delta$; thus the unknowns become
$\delta G_i = G_i - G^g_i$ and $\delta s_{i+\Delta i_n} = s_{i+\Delta i_n} - s^g_{i+\Delta i_n}$, and the left-hand side is
$\delta P_{i,n} = P_{i,n} - P^g_{i,n}$. As usual in iterative schemes, we assume these
differences are small and drop second order terms. Also, we divide through by $G^g_i$, an act
which implicitly assumes that its fractional error is small, but as we shall see in fact it does
not matter if its fractional error isn't small. This gives

$$\frac{\delta P_{i,n}}{G^g_i} = \frac{\delta G_i}{G^g_i} \left( 1 + s^g_{i+\Delta i_n} \right) + \delta s_{i+\Delta i_n} . \qquad (17)$$

We have turned the nonlinear least-squares problem into an iterative linear one.

However, the presence of the term $s^g_{i+\Delta i_n}$ means that the equation-of-condition matrix
changes from one iteration to the next. The number of unknowns is large, and the need to
evaluate the inverse matrix for each iteration requires significant computational time. We
can eliminate this burden by dropping the term $s^g_{i+\Delta i_n}$ in the above equation. This yields our
final set of equations, in which the two unknowns for each channel ($i$) and each LO setting
($\Delta i_n$) are now $\delta G_i / G^g_i$ and $\delta s_{i+\Delta i_n}$:

$$\frac{\delta P_{i,n}}{G^g_i} = \frac{\delta G_i}{G^g_i} + \delta s_{i+\Delta i_n} . \qquad (18)$$

We use this in an iterative solution in which each step is a linear least-squares fit for the two
sets of unknowns. The coefficients of the unknowns are all equal to unity, so they remain
constant from one iteration to the next; in other words, the equation-of-condition matrix in
the least-squares treatment does not change from one iteration to the next.

Now, it might seem that the elimination of the term $s^g_{i+\Delta i_n}$ is an arbitrary action that
produces erroneous results. However, this is not a problem. As $\delta G_i / G^g_i \rightarrow 0$, i.e. as we attain
convergence, the dropped term goes to zero so the final solution is unaffected. And, miraculously,
it does converge, usually rapidly.

The set of equations (18) does not sufficiently constrain the solution. One more equation
is needed to keep the mean RF power $\langle S \rangle$ approximately constant (i.e., approximately equal
to unity), which we include as an additional equation of condition:

$$\sum_{i,n} \delta s_{i+\Delta i_n} = 0 . \qquad (19)$$

To solve this set of equations iteratively, we begin with initial guesses $G^g_i = 1$ and
$s^g_{i+\Delta i_n} = 0$. This provides the initial $P^g_{i,n} = 1$. We least-squares solve the $NI$ equations (18)
and the equation (19) for the unknowns, which are $\delta G_i / G^g_i$ and $\delta s_{i+\Delta i_n}$. The solution provides the
new guesses $G^g_i$ and $s^g_{i+\Delta i_n}$. We usually obtain convergence in $\sim 10$ iterations, which takes
a fraction of a second on a contemporary laptop computer for the 1055-unknown case described above.
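
The iteration described above can be sketched compactly in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the constraint row uses unit coefficients on every RF channel (the form printed in the textbook matrix of §3.3), and `numpy.linalg.pinv` stands in for the explicit normal-equation inverse. The fixed solution matrix is computed once and reused at every step, which is the point of dropping the $s^g$ term.

```python
import numpy as np


def design_matrix(n_chan, offsets):
    """N*I rows of equation (18) plus one constraint row for equation (19)."""
    n_rf = n_chan + max(offsets)
    X = np.zeros((len(offsets) * n_chan + 1, n_chan + n_rf))
    for n, d in enumerate(offsets):
        for i in range(n_chan):
            X[n * n_chan + i, i] = 1.0                # coefficient of dG_i / G_i^g
            X[n * n_chan + i, n_chan + i + d] = 1.0   # coefficient of ds_{i + di_n}
    X[-1, n_chan:] = 1.0                              # sum of ds constrained to zero
    return X


def lsfs_solve(P, offsets, n_iter=40):
    """Iterate equations (18) and (19); P has shape (N, I) with mean near 1."""
    P = np.asarray(P, dtype=float)
    n_lo, n_chan = P.shape
    rf_index = np.arange(n_chan)[None, :] + np.asarray(offsets)[:, None]
    solver = np.linalg.pinv(design_matrix(n_chan, offsets))  # same every iteration
    G = np.ones(n_chan)                     # initial guess G^g = 1
    s = np.zeros(n_chan + max(offsets))     # initial guess s^g = 0
    for _ in range(n_iter):
        P_guess = G[None, :] * (1.0 + s[rf_index])
        rhs = np.append(((P - P_guess) / G[None, :]).ravel(), 0.0)
        step = solver @ rhs
        G = G * (1.0 + step[:n_chan])       # dG = G^g * (dG / G^g)
        s = s + step[n_chan:]
    return G, 1.0 + s


# Noise-free toy data (made-up values); the true S has unit mean by construction.
offsets = [0, 1, 3]
G_true = np.array([0.8, 1.1, 1.2, 0.9])
S_true = 1.0 + np.array([0.00, 0.02, -0.02, 0.01, -0.01, 0.02, -0.02])
P = G_true[None, :] * S_true[np.arange(4)[None, :] + np.array(offsets)[:, None]]
G_fit, S_fit = lsfs_solve(P, offsets)
print(np.abs(G_fit - G_true).max(), np.abs(S_fit - S_true).max())
```

For real data one would stop when the corrections fall below the noise level rather than run a fixed number of iterations; the fixed count here just keeps the sketch short.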

One final comment. After the calculation is finished, the mean of the gains $\langle G^g_i \rangle = \mathcal{G}$
ends up departing a bit from unity. For many purposes, e.g. when combining independent
LSFS results, it is desirable to scale the gains so that their mean is unity. To accomplish
this, simply divide all the derived gains by $\mathcal{G}$, i.e. we write

$$G_{i,\mathrm{scaled}} = \frac{G^g_i}{\mathcal{G}} . \qquad (20a)$$

Similarly, the RF powers are also scaled:

$$S_{i+\Delta i_n,\mathrm{scaled}} = \mathcal{G} \left( 1 + s^g_{i+\Delta i_n} \right) . \qquad (20b)$$



3.2. Number of Equations of Condition; Number of Unknowns 

In least-squares fitting we develop equations of condition, one for each observed quantity.
Least-squares fitting requires the unknowns to be overdetermined, i.e. the number of
unknowns to be smaller than the number of equations of condition. As mentioned above just
after equation (15), the number of unknowns is $a = (2I + \Delta i_{N-1})$. The number of equations
of condition is $M = NI + 1$: $I$ channels for each of the $N$ LO frequencies, plus equation (19).
For the least-squares technique we require $M > a$, i.e.

$$NI > 2I + \Delta i_{N-1} - 1 . \qquad (21)$$

Clearly, it makes no sense to have $\Delta i_{N-1} > I$, because that generates additional values of
$S_{i+\Delta i_n}$ while providing no information on $G_i$. Thus we require $\Delta i_{N-1} = hI$, where $h < 1$.
Dividing equation (21) by $I$, this yields

$$N > 2 + h - \frac{1}{I} . \qquad (22)$$

With $h < 1$, we generally require $N \geq 3$. The quantity $h$ is the maximum LO separation $\Delta i_{N-1}$
in units of the IF bandwidth; we refer to $h$ as the fractional LO coverage.
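
The counting argument is easy to check for any proposed set of LO offsets. The helper below is a hypothetical convenience function that returns the number of unknowns $a$, the number of equations of condition $M$, and the fractional LO coverage $h$.

```python
def lsfs_counts(n_chan, offsets):
    """Return (unknowns a, equations M, fractional LO coverage h)."""
    n_lo = len(offsets)
    max_offset = max(offsets)
    unknowns = 2 * n_chan + max_offset    # I gains plus (I + max offset) RF powers
    equations = n_lo * n_chan + 1         # N*I measurements plus the constraint
    return unknowns, equations, max_offset / n_chan


three_lo = lsfs_counts(4, [0, 1, 3])      # N = 3: overdetermined
two_lo = lsfs_counts(4, [0, 3])           # N = 2: too few equations
print(three_lo, two_lo)
```

With four channels, three LO settings give 13 equations for 11 unknowns, while two settings give only 9 equations for the same 11 unknowns, consistent with the requirement that $N$ be at least 3.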



3.3. An Illustrative Textbook Example

We present an illustrative example with the goal of clarifying the procedure. The simplest
example has the smallest numbers. Since we need $N \geq 3$ and $I > \Delta i_{N-1}$, we choose
$I = 4$ and $N = 3$, with $\Delta i_n = [0, 1, 3]$. That is, we have a 4-channel spectrometer. We
use 3 LO frequencies, with the latter two spaced by 1 and 3 channels from the first. This
provides an arithmetic progression for the successive frequency differences $\Delta i_{nn'}$, which we
define as $(\Delta i_n - \Delta i_{n'})$ (where we consider all combinations of $n$ and $n'$). The values of
$\Delta i_{nn'} = [0, 1, 2, 3]$. Note that $\max(\Delta i_{nn'}) = \Delta i_{N-1} = 3$. Figure 1 graphically illustrates
these parameters; this Figure assumes $G_{IF}(f_{IF}) = 1$ everywhere.

[Figure 1: three panels of $P(f_{IF})$ versus IF channel number $i$, for $n = 0$, 1, 2 with $\Delta i_n = 0$, 1, 3, and a bottom panel of $S_{RF}(f_{RF})$ versus RF channel number.]

Fig. 1. Graphical illustration of the illustrative textbook example, assuming all $G_{IF}(f_{IF}) = 1$.
The top three panels show the measured IF power $P_{i,\Delta i_n}$ [which is captioned "$P(f_{IF})$"]
versus IF channel number $i$; the bottom panel shows the RF spectrum $S_{i+\Delta i_n}$ [which is
captioned "$S_{RF}(f_{RF})$"] versus $(i + \Delta i_n)$ (which is captioned "Channel Number for $S_{RF}(f_{RF})$").

In matrix form, the equations of condition (equations 18 and 19) are

$$\mathbf{X} \cdot \mathbf{a} = \mathbf{p} . \qquad (23)$$

Here, $\mathbf{X}$ has $(NI + 1)$ rows and $(2I + \Delta i_{N-1})$ columns, $\mathbf{a}$ is the $(2I + \Delta i_{N-1})$ vector of
unknowns, and $\mathbf{p}$ is the $(NI + 1)$ vector of $NI$ measured and one constrained quantities.
Our notational convention is that boldface small letters are vectors and boldface large letters
are matrices.

For our textbook example, in the vector of unknowns $\mathbf{a}$, to avoid clutter we write $g_i$ in
place of $\delta G_i / G^g_i$. To save space, we write the transpose of this vector, which is

$$\mathbf{a}^T = [g_0, g_1, g_2, g_3, \delta s_0, \delta s_1, \delta s_2, \delta s_3, \delta s_4, \delta s_5, \delta s_6] . \qquad (24)$$

In the vector of measured quantities $\mathbf{p}$, we write $P_{i,n}$ in place of $\delta P_{i,n} / G^g_i$. This vector's
transpose is

$$\mathbf{p}^T = [P_{0,0}, P_{1,0}, P_{2,0}, P_{3,0}, P_{0,1}, P_{1,1}, P_{2,1}, P_{3,1}, P_{0,2}, P_{1,2}, P_{2,2}, P_{3,2}, 0] , \qquad (25)$$

and the equation-of-condition matrix consists of the coefficients in equations (18) and (19), all
of which are unity:

$$\mathbf{X} = \begin{bmatrix}
1&0&0&0&1&0&0&0&0&0&0\\
0&1&0&0&0&1&0&0&0&0&0\\
0&0&1&0&0&0&1&0&0&0&0\\
0&0&0&1&0&0&0&1&0&0&0\\
1&0&0&0&0&1&0&0&0&0&0\\
0&1&0&0&0&0&1&0&0&0&0\\
0&0&1&0&0&0&0&1&0&0&0\\
0&0&0&1&0&0&0&0&1&0&0\\
1&0&0&0&0&0&0&1&0&0&0\\
0&1&0&0&0&0&0&0&1&0&0\\
0&0&1&0&0&0&0&0&0&1&0\\
0&0&0&1&0&0&0&0&0&0&1\\
0&0&0&0&1&1&1&1&1&1&1
\end{bmatrix} \qquad (26)$$

In this matrix,

1. The first four rows pertain to the lowest LO frequency with $\Delta i_0 = 0$.

2. The next two sets of four rows pertain to $\Delta i_1 = 1$ and $\Delta i_2 = 3$.

3. The last row is the power conservation equation (19).

4. The first four columns are the coefficients of the four IF gains $\delta G_i / G^g_i$ in equation (18).

5. The last seven columns are the coefficients of the seven RF powers $\delta s_{i+\Delta i_n}$ in equation (18).

The usual least-squares process of solving these equations of condition (see Press et al. 1992)
involves multiplying $\mathbf{X}$ by its transpose to obtain the curvature matrix (the matrix of normal
equations) and then taking the inverse of that matrix product to obtain the covariance matrix
$\boldsymbol{\alpha}$:

$$\boldsymbol{\alpha} = (\mathbf{X}^T \cdot \mathbf{X})^{-1} , \qquad (27)$$

and the solution for the coefficient vector is

$$\mathbf{a} = (\boldsymbol{\alpha} \cdot \mathbf{X}^T) \cdot \mathbf{p} . \qquad (28)$$

For this illustrative problem, the inverse is well defined with no numerical problems. The
biggest normalized covariance (or correlation), obtained by converting the covariance matrix
to a correlation matrix, is $-0.51$, whose absolute value is not unreasonably large.
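
The textbook matrix and its covariance matrix can be rebuilt numerically as a cross-check. The sketch below is an illustration with hypothetical variable names: it constructs $\mathbf{X}$ for $I = 4$ and $\Delta i_n = [0, 1, 3]$, inverts the normal equations as in equation (27), and converts the result to a correlation matrix so that the largest off-diagonal element can be compared with the value quoted above.

```python
import numpy as np

n_chan, offsets = 4, (0, 1, 3)
n_rf = n_chan + max(offsets)                 # seven RF channels

rows = []
for d in offsets:                            # one block of I rows per LO setting
    for i in range(n_chan):
        row = np.zeros(n_chan + n_rf)
        row[i] = 1.0                         # coefficient of g_i
        row[n_chan + i + d] = 1.0            # coefficient of ds_{i + di_n}
        rows.append(row)
rows.append(np.r_[np.zeros(n_chan), np.ones(n_rf)])   # power conservation row
X = np.array(rows)

alpha = np.linalg.inv(X.T @ X)               # covariance matrix, equation (27)
sigma = np.sqrt(np.diag(alpha))
corr = alpha / np.outer(sigma, sigma)        # normalized covariance
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
largest = off_diag[np.argmax(np.abs(off_diag))]
print(X.shape, np.linalg.matrix_rank(X), round(float(largest), 2))
```

The matrix has full column rank, so the normal equations are invertible for this choice of offsets; without the final constraint row the rank would drop by one, because adding a constant to every $g_i$ and subtracting it from every $\delta s$ leaves equation (18) unchanged.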

4. AN ILLUSTRATIVE REAL-WORLD EXAMPLE

Our initial experiments with LSFS were performed with three independent banks of
Arecibo's interim correlator^1. Each bank had 2048 channels and covered a bandwidth of 25
MHz; we overlapped the three banks by cutting off 64 channels on each end, i.e. we spaced
the centers by $\frac{15}{16} \times 25$ MHz, stitching together a single spectrometer with 5888 channels
covering a total contiguous bandwidth of 71.875 MHz. We binned the channels by a factor
of 8, making 736 channels of width 0.0977 MHz. We used four different LO frequencies
with spacings $\Delta i_n = [0, 35.50, 43.69, 72.36]$, providing the nonuniform set of six spacings
$\Delta i_{nn'} = [8.19, 28.67, 35.50, 36.86, 43.69, 72.36]$. Figure 2 shows the results.
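
The numbers in this setup can be verified with a few lines of arithmetic. The sketch below assumes that adjacent banks overlap by 128 channels (64 trimmed from each of the two abutting edges), which is the interpretation that reproduces the quoted 5888 channels; the variable names are illustrative.

```python
from itertools import combinations

bank_channels, bank_bw_mhz, n_banks, trim = 2048, 25.0, 3, 64
chan_width_mhz = bank_bw_mhz / bank_channels

# Assumed layout: bank centres are 2048 - 2*64 = 1920 channels apart.
centre_step = bank_channels - 2 * trim
total_channels = bank_channels + (n_banks - 1) * centre_step
total_bw_mhz = total_channels * chan_width_mhz
binned_channels = total_channels // 8
binned_width_mhz = 8 * chan_width_mhz

# All pairwise differences of the four LO offsets quoted in the text.
lo_offsets = [0.0, 35.50, 43.69, 72.36]
spacings = sorted(round(b - a, 2) for a, b in combinations(lo_offsets, 2))

print(total_channels, total_bw_mhz, binned_channels, round(binned_width_mhz, 4))
print(spacings)
```

The six pairwise differences are all distinct, which is what the text means by a nonuniform set of spacings.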

LSFS works reasonably well in this initial-experiment example. We performed this
observation before devising the schemas discussed in §7 and we cannot remember how we
chose the set of LO frequencies $\Delta i_n$, but we suspect it is not a very good choice because the
middle third of the RF power spectrum $S_{RF}(f_{RF})$ is slightly displaced from the others. Such
artifacts do not occur with the better schemas of §7. It is clear that LSFS would perform
admirably with a better schema, and this particular case illustrates how one can accomplish
the often desirable goal of reliably generating a contiguous large band from narrower ones.

We refer to two other real-world examples. In one, we use LSFS to determine $G_{IF}(f_{IF})$
and use that gain spectrum to correct thousands of measured spectra; we provide some
details in §5.3. In the other, we use LSFS to determine the intrinsic ripples in the RF power
spectrum; see §11.2.

5. DETAILS OF MATRIX ALGEBRA FOR LARGER $I$, LARGER $N$

5.1. Degeneracy in the X Matrix 

Under some conditions we find empirically that some of the equations of condition
become degenerate. This degeneracy is best understood by considering the $\mathbf{X}$ matrix in
equation (23) as a series of $a$ column vectors, where $a$ is the number of unknowns;
$a = 2I + \Delta i_{N-1}$. Suppose that two columns are degenerate, meaning that they are linear
combinations of each other. In the matrix product $\mathbf{X} \cdot \mathbf{a}$, there is a one-to-one relationship
between each column in $\mathbf{X}$ and its corresponding unknown coefficient in $\mathbf{a}$. A degeneracy
between two columns means that the matrix product cannot distinguish between the two
corresponding coefficients.

^1 The Arecibo Observatory is part of the National Astronomy and Ionosphere Center, which is operated
by Cornell University under a cooperative agreement with the National Science Foundation.

Fig. 2. Real-world example of LSFS discussed in §4. The top panel shows the total
measured power $P_{IF}(f_{IF})$ versus RF frequency for the four LO frequencies; note that the
HI line remains fixed near 1420 MHz and the IF bandpass shapes move with the LO. The
second panel shows the measured power $P_{IF}(f_{IF})$ versus IF frequency for the four LO
frequencies; the IF bandpass shapes remain fixed and the HI line moves with the LO. The third
panel shows the derived RF power, $S_{RF}(f_{RF})$ in equation (9), with the known spectral peaks
identified. The bottom panel shows the IF gain $G_{IF}(f_{IF})$.

When one applies the usual technique of generating the normal equations and inverting
the curvature matrix, as in equation (27), it does not work: the inverse matrix does not exist
because of the degeneracy. In cases like this one has two choices: think hard and find the
root cause of the degeneracies, or use Singular Value Decomposition (SVD) to empirically
remove them. The number of unknowns is large, so the first option (picking through a
huge matrix looking for degeneracies) is difficult, probably even for a mathematical expert.
Therefore, we choose the latter.

5.2. SVD: Theory 

Numerical Recipes (Press et al. 1992) provides a useful discussion of the SVD technique as
applied to least squares. The SVD technique forgoes the usual generation of the normal
equations for calculating the matrix $(\boldsymbol{\alpha} \cdot \mathbf{X}^T)$ in equation (28). Rather, it expresses this
matrix in terms of three matrices that are derived from the SVD decomposition of the $\mathbf{X}$
matrix.

The cornerstone of SVD is that any $M \times a$ matrix, where the number of rows $M$ and the
number of columns $a$ satisfy $M > a$, can be decomposed as the product of three matrices.
In particular, our matrix $\mathbf{X}$ in equation (23) satisfies this criterion, so we can write

$$\mathbf{X} = \mathbf{U} \cdot [\mathbf{W}] \cdot \mathbf{V}^T , \qquad (29)$$

where the right-hand side contains the three SVD matrices. These matrices have important
properties:

1. $\mathbf{U}$ is $M \times a$, $[\mathbf{W}]$ is $a \times a$ and diagonal, and $\mathbf{V}^T$ is $a \times a$. The square brackets around
the matrix $\mathbf{W}$ indicate that it is diagonal.

2. The columns of $\mathbf{U}$ consist of unit vectors that are orthonormal, and the same is true
for $\mathbf{V}$. Because $\mathbf{V}$ is square, its rows are also orthonormal so that $\mathbf{V} \cdot \mathbf{V}^T = \mathbf{1}$. Recall
that, for square orthonormal matrices, the transpose equals the inverse so $\mathbf{V}^T = \mathbf{V}^{-1}$.
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Degeneracies are directly reflected in the V and [W] matrices. The square matrix V 
consists of a set of a orthonormal column vectors. These are normalized orthogonal vectors 
that are linear combinations of nonorthogonal column vectors in X. Suppose that L columns 
of X are degenerate. Then the number of independent orthonormal vectors represented in X 
is decremented by L. Nevertheless, the V matrix still contains a independent orthonormal 
vectors. The decrement by L is represented not by the orthonormal column vectors in V, 
but rather by their weights in the diagonal matrix [W]. Each column vector in V has an 
associated weight in W, and if there is degeneracy, then the corresponding value of W is 
zero. This means the corresponding orthonormal vector in V cannot be represented by the 
column vectors in X. 

Having derived the SVD components of X, we can write for the matrix product
$(\alpha \cdot X^T)$ in equation (28)

$$\alpha \cdot X^T = V \cdot \left[ \frac{1}{W} \right] \cdot U^T . \qquad (30)$$

Now suppose there is degeneracy; then the corresponding values of [W] are zero, so the
corresponding values of [1/W] become infinite. This is an attempt by the matrix algebra to
represent the space defined by the corresponding columns of V with data that were taken
with inappropriate values of X.

SVD, as applied to least squares, handles these infinities by setting the corresponding
values of [1/W] (which are formally equal to $\infty$) to zero. This provides stable, realistic solutions
in which the offending degenerate coefficient values are close to being correct or, at least,
not totally unreasonable. By following this procedure, one can handle degeneracies
without understanding their cause simply by zeroing out the relevant components of the
[1/W] vector.

This zeroing process can, and should, be applied in cases of near degeneracy. Just
exactly what "near" means depends on the noise in the data, because the data values are
amplified by [1/W] in calculating the coefficient values. For sufficiently noisy data one might
best zero out the offending elements of [1/W] even if they are not too terribly large.

The X matrix depends only on the values $\Delta i_n$, not on the data values S. Moreover,
calculating the inverse of the X matrix is computationally expensive. Thus, when using
a particular set of LO frequency offsets for multiple observations, it behooves one to do
the SVD calculation of $(\alpha \cdot X^T)$ once, store the results on disk, and read them back when
necessary. This has the further advantage that one can examine the weight vector [W] once
for each set of LO frequency offsets, decide which particular values of [1/W] to set to zero, and
forget about dealing with this on a case-by-case basis. Section 5.3 provides some comments
on examples we encountered.
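
The zeroing of inverse weights described above can be sketched in a few lines. This is a minimal illustration, not the code used for this paper (which was written in IDL); the function name `svd_solve`, the threshold parameter `ratio_limit`, and the toy design matrix are all assumptions made for the example.

```python
import numpy as np

def svd_solve(X, y, ratio_limit=1e6):
    """Least-squares solution of X @ a = y with near-degenerate modes zeroed."""
    U, w, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(w) @ Vt
    keep = w > w.max() / ratio_limit                    # weights judged non-degenerate
    inv_w = np.zeros_like(w)
    inv_w[keep] = 1.0 / w[keep]                         # [1/W], with the infinities set to zero
    a = Vt.T @ (inv_w * (U.T @ y))                      # a = V [1/W] U^T y
    return a, w, keep

# Toy design matrix: the third column duplicates the first (an exact degeneracy).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X[:, 2] = X[:, 0]
y = X @ np.array([1.0, 2.0, 3.0])

a, w, keep = svd_solve(X, y)
print(w, keep, a)
```

In this toy case the data are still fitted exactly, and the two degenerate coefficients share their sum equally, which is the "not totally unreasonable" behavior described above.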

5.3. SVD: Practice 

The ratio of maximum to minimum weights determines whether some inverse weights 
should be zeroed. For noise-free data, ratios that come close to the machine accuracy (per- 
haps 10^ for single-precision math) should be zeroed; in the presence of noise, smaller ratios 
should be zeroed. Our initial experiment of §l]with four LO frequencies had no degeneracy. 
For our numerical experiments, we examined various schemas, which are detailed in §7.1
and §7.2. Most schemas had weight ratios smaller than 2500. The highest ratios occurred
for N=3 and decreased rapidly (e.g., by a factor of 2) for successive increases in N. In some 
cases we zeroed inverse weights to keep the ratio smaller than a few hundred, and in some 
cases not. 

There were two exceptions, which had much larger ratios. The MRN,R schemas had 
large ratios, again ~ 10^, for $R \neq 1$. For R = 2, 4, and 8 we had to zero 1, 3, and 7 inverse
weights, respectively. (Recall that this is out of a total of over 1000 weights). The 2^^^ 
schema had ratios ~ 10'' for a few elements; this schema is no good anyway and we do not 
quote results for it here. 

Our work with the GALFA spectrometer (Stanimirovic et al. 2006) is an interesting case. 
This spectrometer observes two spectra simultaneously, the wide "calibration" spectrum and 
the much narrower "science" spectrum. The calibration spectrum is 100 MHz wide with 512 
channels, for which we use the MR7 arrangement (see §7), exactly like our MR7 numerical
experiment below.

We use the same set of 7 LO frequencies for the science spectrum. This spectrum covers
a bandwidth of about 7.143 MHz and has 8192 channels, which makes the channel spacing
about 872 Hz. Thus, the LO increment $\Delta f$ is about 224 times the channel spacing. Of the
8192 channels, 7679 are recorded. We invent an extra, making the total 7680. Before applying
LSFS we rebin these, lumping successive bins of 16 together so that the total number of
binned channels is 480. The frequency offset of the seventh LO frequency is almost as large
as the total bandwidth, so we use only the first six LO frequencies. This gives increments
$\Delta i_{nn'}$ ranging from 14 to 283, so $h = \frac{283}{480} \approx 0.59$. We assume smoothness and use an
interpolation procedure to recover the 7679 values of $G_{IF}(f_{IF})$.
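
The channel arithmetic in this paragraph can be checked with a short script. It is only a sketch: the variable names are ours, and it assumes that the LO increment equals one channel of the 100 MHz, 512-channel calibration spectrum, which is an inference from the quoted numbers rather than something stated explicitly.

```python
# Quoted values from the text.
cal_bandwidth_hz = 100e6      # calibration spectrum bandwidth
cal_channels = 512
sci_bandwidth_hz = 7.143e6    # science spectrum bandwidth
sci_channels = 8192

# Assumption: one LO step is one calibration channel.
lo_increment_hz = cal_bandwidth_hz / cal_channels
sci_spacing_hz = sci_bandwidth_hz / sci_channels          # about 872 Hz
lo_in_sci_channels = lo_increment_hz / sci_spacing_hz     # about 224

binned_channels = 7680 // 16                              # 480 binned channels
smallest_binned_step = lo_in_sci_channels / 16            # about 14
h = 283 / binned_channels                                 # fractional LO coverage

print(sci_spacing_hz, lo_in_sci_channels, binned_channels, smallest_binned_step, h)
```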

This narrowband case exhibits thirteen degeneracies. We have 1198 coefficients to derive, 
so we have 1198 orthonormal column vectors in the V matrix. The 1198 weights [W] range
from about 8.9 x 10"^ to 27. Thirteen weights are nearly degenerate, being smaller than 
1.8 X 10^"^; we set their corresponding inverses to zero. The lowest nonzeroed weight is 
~ 0.69, so there is a huge gap between the range of accepted weights (~ 0.69 to 27) and the 
zeroed inverse weights (the largest is ~ 1.8 x 10~^). Zeroing the 13 inverse weights provides 
a very nice solution for the IF gain, which is used to correct thousands of measured mapping 
spectra. 

6. REGARDING COMPUTING TIME 

Regarding computing time, we scale from the MR7 scheme described in §3, which has
$I = 512$, $N = 7$, and $\Delta i_{N-1} = 31$. The LSFS X matrix inversion for $I = 512$ and $\Delta i_{N-1} = 31$
takes 123 seconds on a not-quite contemporary laptop computer programmed in IDL. This
time scales as the number of unknowns cubed, i.e. as $(2I + \Delta i_{N-1})^3$, so for $I = 4096$ it would
take about 20 hours. This is a long time, but as discussed in §3] and §3 the matrix inversion
should be done once and the result stored on disk. The X matrix is sparse, and perhaps
sparse matrix techniques would make its inversion go faster.
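
The cubic scaling can be made concrete with a small helper. This is a sketch under stated assumptions: the function name `inversion_time_s` is hypothetical, and the maximum LO offset is taken to grow in proportion to the number of channels (fixed fractional coverage h), which the text does not specify.

```python
def inversion_time_s(I, delta_max, t_ref=123.0, I_ref=512, delta_ref=31):
    """Scale the quoted 123 s inversion time by (number of unknowns)**3."""
    unknowns = 2 * I + delta_max
    unknowns_ref = 2 * I_ref + delta_ref
    return t_ref * (unknowns / unknowns_ref) ** 3

# Assumption: the maximum offset scales with I, i.e. 31 * (4096 / 512) = 248.
hours = inversion_time_s(4096, 8 * 31) / 3600.0
print(f"{hours:.1f} hours")
```

Under that assumption the estimate comes out somewhat below one day, the same order as the figure quoted above.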

Once the matrix is inverted the solutions go fast. For the MR7 scheme, the solution 
time is about 0.6 seconds. This time scales as NI{2I + Aijv-i), so if is kept constant it 
scales roughly as thus, 4096 channels would take about 5 seconds. 

7. SCHEMAS FOR LO SETTINGS 

In §3.2 we found that the number of different LO frequencies, $N$, must exceed 3. However,
this tells us nothing about how the quality of the solution is affected by $N$, and even
less about how the LO frequencies should be chosen, i.e. the values of $\Delta i_n$. It is not clear
to this author how to investigate these matters analytically. Rather, we turn to numerical
experiments. Specifically, we consider what we hope are intelligent schemas for $\Delta i_n$, adopt
them for a range of $N$, and analyze the results of the numerical experiments. We begin by
discussing various schemas. First we describe a conservative approach, which uniformly
samples $\Delta i_{nn'}$ up to a maximum but requires a fairly large $N$, and then we describe
some less conservative approaches.

7.1. The Minimum-Redundancy (MR) Schema

First we discuss Minimum-Redundancy (MR) settings having N LO settings. We will
denote such settings by the symbol MRN, where N is the number of LO settings. At the
most conservative and basic level, common sense suggests the following criteria:

1. An arithmetical progression for the successive frequency differences $\Delta i_{nn'}$, which we
also call "spacings". Spacings should begin with unity and increment by unity up to
some maximum value $N_{max}$. It seems to us, intuitively, that incrementing by unity
is akin to sampling uniformly when doing Fourier transforms, which is always the
desirable situation.

2. A reasonably large value for the fractional LO coverage $h$, which is the maximum
LO offset $\Delta i_{N-1}$ in units of the spectrometer bandwidth. It seems to us, intuitively,
that precisely recovering broad-scale frequency structure requires sampling those broad
scales with a comparably large value of $\Delta i_{N-1}$.

The problem of generating an arithmetic progression with settings of the LO frequency
is akin, in radioastronomical interferometry, to the well-studied problem of generating an
arithmetic progression of unidirectional baselines with a linear array of telescopes. For N LO
frequencies there are $\frac{N(N-1)}{2}$ frequency spacings, some of which are redundant; similarly, for
N telescopes on the ground there are $\frac{N(N-1)}{2}$ distance spacings, some of which are redundant.
The classic discussion by Moffett (1968) considers these minimum-redundancy telescope
arrays, which use N antennas to generate a minimally redundant arithmetical progressive
series for $\Delta i_{nn'}$. Zero redundancy is possible only for $N \leq 4$; for $N = 4$ the spacings range
up to $\Delta i_{N-1} = 6$. For $N > 4$ there must be some redundancy. Moffett presents two types of
minimum-redundancy arrays, restricted and general.

Restricted minimum-redundancy arrays provide all spacings $\Delta i_{nn'}$ up to a maximum
$N_{max}$, with no gaps; these are useful for radio interferometry when the available real estate
is limited. General ones provide all spacings up to a particular limit, and in addition provide
larger spacings. For example, the N = 7 restricted array provides all spacings $\Delta i_{nn'} \leq 17$,
while the N = 7 general array provides all spacings $\Delta i_{nn'} \leq 18$ and, in addition, $\Delta i_{nn'}$ =
[24, 26, 31]. For our purposes the general array is a better choice: apart from the fact that we
are not limited by available real estate, it provides more different values for $\Delta i_{nn'}$ and, also, a
larger value of h. For the general minimum-redundancy arrays, roughly Aijv-i ~ 31 (y)^'^^. 
This is a steep dependence, so it is possible to generate a large number of LO frequencies to
get a desired h without making N ridiculously large.

The first nine rows of Table 1 present information about the range $N$ = 3-11. Column
1 contains our names for the setting arrangement, running from MR3 to MR11; columns 2-4
provide the three quality indicators from the numerical experiments, which we define below
in §8; columns 5-6 provide the maximum LO separation with continuous coverage $N_{max}$ and
the maximum LO separation $\Delta i_{N-1}$. The last column, headed "LO spacings", is from the
general arrays in Moffett's (1968) Table 1. It has dots and numbers: the dots represent the
frequency settings of the LO and the numbers the spacing between the frequencies in units of
the minimum spacing. Ishiguro (1980) discusses a subset of algorithms that generate arrays
having larger $N$.
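
The spacing bookkeeping of this section is easy to verify numerically. The sketch below counts the N(N-1)/2 pairwise spacings for a set of LO offsets and reports the largest gap-free spacing, the maximum spacing, and the number of redundant spacings. The function name `spacing_summary` and the two example offset lists are our own illustrative choices, not entries copied from Table 1; they were picked because they reproduce the N = 4 and restricted N = 7 properties quoted in the text.

```python
from collections import Counter
from itertools import combinations

def spacing_summary(offsets):
    """Return (pair count, gap-free maximum, overall maximum, redundant count)."""
    diffs = Counter(abs(b - a) for a, b in combinations(offsets, 2))
    n_max = 0
    while diffs.get(n_max + 1, 0) > 0:      # extend while the next spacing is present
        n_max += 1
    redundant = sum(count - 1 for count in diffs.values())
    n_pairs = len(offsets) * (len(offsets) - 1) // 2
    return n_pairs, n_max, max(diffs), redundant

four = spacing_summary([0, 1, 4, 6])                 # zero-redundancy case
seven = spacing_summary([0, 1, 2, 6, 10, 14, 17])    # a restricted-type arrangement
print(four, seven)
```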



7.2. Other Schemas 

From the practical standpoint, an observer wants to keep $N$ as small as possible. MRN
settings obtain uniform coverage in $\Delta i_{nn'}$ but require fairly large $N$ to attain large maximum
separations $\Delta i_{N-1}$. Here we propose schemas that sacrifice the uniformly-sampled $\Delta i_{nn'}$ in
favor of increasing the maximum separation $\Delta i_{N-1}$.

We consider five such schemas, four of which are specified in terms of the MRN spacings
$\Delta i_{MRN}$. The five schemas are:

1. The LO separations are the square of the MRN set, equal to $(\Delta i_{MRN})^2$. We designate
this by MRN$^2$. This works quite well and we recommend it in our comparative
discussion in §8.3.

2. The LO separations are the 1.7-power of the MRN set, equal to $(\Delta i_{MRN})^{1.7}$. We
designate this by MRN$^{1.7}$. This works almost as well as the MRN$^2$ schema.

3. The LO separations are equal to $(-3)^n$, where n varies from 0 to $N - 1$. For $1 \leq n \leq
(N - 1)$, the $n^{th}$ LO frequency is given by $\Delta i_n = \Delta i_{n-1} + (-3)^{n-1}$. We designate this
by $3^N$. This works comparably to the MRN$^{1.7}$ schema.

4. The LO values (not the separations) are a power-law series. The offset of the $n^{th}$ LO
from the $0^{th}$ is equal to $n^{1.7}$, where n varies from 0 to $N - 1$. We designate this by
$N^{1.7}$. This works less well than the above schemas.

5. The LO separations are equal to 2'^*"^'^. We designate this schema by 2^'^^^ . This 
schema provides such poor results that we do not include it in Table [H 

The exponent 1.7 in schemas MRN$^{1.7}$ and $N^{1.7}$ is inspired by the choice of spacings for the
Very Large Array antennas. We have not experimented with different values of the exponent.

Table 1. Schemas: Definitions and Results

Note. Schemas are defined in the text, §7.1 and §7.2. Columns 2-4 contain our three quality
indicators: the mean RF power offset and the RMS error for IF gains, defined in §8.1, and the
lowest-frequency Fourier amplitude, defined in §8.2. Spacings are completely covered up to $N_{max}$,
which is given in column 5. Column 6 contains $\Delta i_{N-1}$, which is the maximum LO spacing and is
equal to the sum of the $N - 1$ spacings.

8. NUMERICAL EXPERIMENTS

We evaluated the above schemas in numerical experiments. We first invented a noise-free
IF gain and RF power spectrum and, for each schema, ran 256 trials. In each trial we
added 2 K Gaussian-distributed noise for each LO setting.

Fig. 3. The noise-free input spectra for our numerical experiments. The top two panels
display the IF gain spectrum versus IF frequency and its Fourier amplitude spectrum; the
bottom two the RF power spectrum versus RF frequency and its Fourier amplitude spectrum.



Figure 3 shows the noise-free input spectrum and its Fourier amplitude spectrum for
the IF gain (top two panels) and the RF power (bottom two panels). The IF gain is the
product of three terms, which are meant to simulate three different hardware characteristics:

1. The overall shape of the band-limiting filter at the spectrometer input,

$$G_{IF,filter} = 0.5 \left( \tanh[5(f_{IF} + 1)] - \tanh[5(f_{IF} - 1)] \right) . \qquad (31a)$$

2. A standing wave with period 0.5 MHz in the cable connecting the feed to the
spectrometer,

$$G_{IF,wave} = 1 + 0.1 \cos\left( 2\pi \frac{f_{IF}}{0.5} \right) . \qquad (31b)$$

3. A slowly varying polynomial-dependent gain from electronics and amplifiers,

$$1 + 0.1 f_{IF} + 0.5 f_{IF}^2 . \qquad (31c)$$

In all of the above, the IF frequency $f_{IF}$ runs from -1.5 to +1.5 MHz, which range is
covered by $I = 512$ channels. The top panel of Figure 3 shows the product of these terms.
The second panel shows the Fourier amplitude spectrum, with the horizontal axis being
the Fourier component wavelength in units of the 3-MHz IF bandpass. Thus, the standing
wave term in equation (31b) is clear in this plot: its period is 0.5 MHz, which is 1/6 of the
3-MHz IF bandwidth, so it appears at 0.17 on the horizontal axis. We could have labeled the
horizontal axis "Fourier Wavelength, MHz" and made the maximum 3 instead of 1.
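
For readers who want to reproduce the input IF gain, the three terms can be multiplied together as follows. This is a sketch rather than the original simulation code: the channel grid (512 points evenly spaced from -1.5 to +1.5 MHz, endpoints included) and the variable names, in particular `g_poly` for the polynomial term, are our assumptions.

```python
import numpy as np

I = 512
f_if = np.linspace(-1.5, 1.5, I)     # IF frequency in MHz (assumed channel grid)

g_filter = 0.5 * (np.tanh(5 * (f_if + 1)) - np.tanh(5 * (f_if - 1)))   # band-limiting filter
g_wave = 1 + 0.1 * np.cos(2 * np.pi * f_if / 0.5)                       # 0.5 MHz standing wave
g_poly = 1 + 0.1 * f_if + 0.5 * f_if**2                                 # slow polynomial gain

g_if = g_filter * g_wave * g_poly    # noise-free input IF gain
print(g_if[0], g_if[I // 2], g_if[-1])
```

The gain is close to 1.1 at band center, where the filter is near unity and the standing wave is at a crest, and falls to a few percent at the band edges.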

Similarly, the RF power is the sum of three terms:

1. A frequency-independent system temperature of 30 K;

2. Five rectangular spectral lines of widths 2, 4, 8, 16, and 32 channels having amplitudes
7, 6, 5, 4, 3 K and spaced so that they are nonoverlapping (we used rectangular lines
to facilitate seeing degradation in frequency resolution);

3. Channel-to-channel uniformly-distributed random noise with amplitude limits 0 to 5 K.
This simulates a rich, crowded mm-wave spectrum containing a plethora of molecular
lines.

The sum of all these components provides a channel-average system temperature of slightly
more than 32.5 K. The third panel of Figure 3 shows the RF power over the 6 MHz bandwidth.
The fourth panel shows the Fourier amplitude spectrum, with the horizontal axis being the
Fourier wavelength in units of MHz. That is, sinusoidal ripples across the spectrum in panel
3 have a maximum period of 1 cycle over the 6 MHz band.

8.1. Results: Two Quality Indicators

Fig. 4. Plots of quality indicators $\Delta(RF)$ and $\sigma(IF)$ versus the number of LO settings $N$.
The different plot symbols specify the LO spacing schemas discussed in §7.1 and §7.2. The
top panel displays the mean error of the RF power and the bottom the RMS uncertainty of
the IF gain.

Figure 4 plots quality indicators for the above schemas as a function of the number of
LO settings $N$. Also, Table 1 lists these quantities in columns 2 and 3. We first consider
two indicators:

1. The top panel displays $\Delta(RF)$, the mean error of the RF power, where the mean is over
two quantities, the channels in the RF spectrum and the 256 trials. $\Delta(RF)$ represents
an offset bias in the derived RF powers. The units are Kelvins and the mean system
temperature is 32.5 K, so an error of 0.1 K is a fractional error of 0.3%.

2. The bottom panel displays $\sigma(IF)$, the RMS uncertainty of the IF gain spectrum, in
units of the theoretical RMS. To calculate this, we first form the difference between (a)
the mean IF gain spectrum averaged over the 256 trials and (b) the input, noise-free
IF gain spectrum. We consider only the central 200 channels and fit a second-order
polynomial to eliminate broad baseline wiggles, which are more serious for smaller
values of the fractional LO coverage h. With our 256 trials the number of independent
measurements for each channel is $256N$ and, with our 2 K input noise and 32.5 K
system temperature, the ideal theoretical RMS is

$$\frac{2/32.5}{\sqrt{256N}} .$$

For completeness, we could also present the RMS uncertainty of the RF power. However, 
those results are comparable to those of the IF gain, so we refrain from presenting these to 
save space. 
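
As a quick numerical illustration of the ideal theoretical RMS defined above, the following sketch evaluates it for a given number of LO settings. The helper name `ideal_rms` and its keyword arguments are hypothetical; the default values are the ones quoted in the text.

```python
import math

def ideal_rms(n_lo, noise_k=2.0, tsys_k=32.5, trials=256):
    """Fractional noise per trial divided by sqrt(number of independent measurements)."""
    return (noise_k / tsys_k) / math.sqrt(trials * n_lo)

rms_mr7 = ideal_rms(7)     # N = 7 LO settings
print(rms_mr7)
```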

Both $\Delta(RF)$ and $\sigma(IF)$ decrease with $N$, and some schemas are better than others. In
particular, the MRN, MRN$^2$, MRN$^{1.7}$, and $3^N$ schemas are all quite good.



8.2. Results: Relative Fourier Amplitudes, a Third Quality Indicator 

The above two quality indicators do not tell the whole story, for two reasons. One, they 
are derived after baseline subtraction, which removes large-scale ripples in the frequency 
spectra; we expect such ripples to be larger when $h$, and thus $N$, is small. Two, schemas
other than the minimum-redundancy one do not uniformly sample Fourier components, and 
so their Fourier amplitude spectra should be less uniform. Here the focus is on the relative 
Fourier amplitudes, so we define all amplitude spectra F-Ampl to be the actual amplitude 
divided by the minimum amplitude Fourier component for that spectrum. 
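
The normalization just described can be sketched as follows. The function name `relative_fourier_amplitudes` is ours, and dropping the zero-frequency component before taking the minimum is an assumption; for a 512-channel residual this leaves components from 1 to 256 cycles per band.

```python
import numpy as np

def relative_fourier_amplitudes(residual):
    """Fourier amplitudes of a residual spectrum, divided by the smallest one."""
    amp = np.abs(np.fft.rfft(residual))[1:]    # drop the zero-frequency term (assumed)
    return amp / amp.min()

# Toy residual: white noise standing in for an IF gain residual spectrum.
rng = np.random.default_rng(2)
f_ampl = relative_fourier_amplitudes(rng.normal(size=512))
print(f_ampl[:5])
```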

Figure 5 displays Fourier amplitudes of the IF gain residual spectra for a selection of our
schemas. The left-hand panel emphasizes the higher frequencies. The bottom three plots
are for MR5,1, MR7,1, and MR11,1. With the uniform sampling in $\Delta i_{nn'}$, these provide
quite uniform Fourier amplitudes at high frequencies; and as $N$ increases from 5 to 11, the
fractional LO coverage h increases, which decreases the amplitude of lower-frequency Fourier
components. For MR7,2 and MR7,4, we have strong Fourier components at frequencies 256
and 128 cycles, respectively, over the 512-channel IF band. These Fourier components are
easily understood, because they correspond to periods of 2 and 4 channels, respectively. The
other schemas do not have uniform sampling in $\Delta i_{nn'}$, and this is reflected in their Fourier
amplitudes, which are not very uniform at high frequencies.

The right-hand panel emphasizes the low frequencies. Generally, the low-frequency
amplitudes are smaller for large fractional LO coverage h. This is reflected in the MR7,1,
MR7,2, and MR7,4 spectra, as well as the MR7,1 and MR11,1 spectra. Also the other
schemas emphasize larger h at the expense of uniform coverage in $\Delta i_{nn'}$, and this is reflected
in their smaller low-frequency amplitudes.

Fig. 5. Plots of Fourier amplitude spectra of the IF gain residual spectra. The left panel
exhibits the full frequency range from 1 cycle to 256 cycles per 512-channel IF band, with
successive spectra displaced vertically by 1 unit. The right panel displays the low-frequency
components by exhibiting the first 10 components (excluding the zero frequency component),
with successive spectra displaced by 10.625 units.

Fig. 6. The amplitude of the first Fourier coefficient versus h, the fractional LO coverage.
The dashed line is a minimum-absolute-residual-sum fit to the points.

The first Fourier coefficient serves as a proxy for the low-frequency Fourier components.
Figure 6 plots the first Fourier amplitude F-Ampl[1] versus the fractional LO coverage h.
As we anticipated, F-Ampl[1] decreases with h; the dashed line is a minimum-absolute-
residual-sum fit to the points, a fit that de-emphasizes large deviations. The fit
is



F-Ampl[l] ^ . (32) 

This parameter, F-Ampl[1], is our third quality indicator, and we list it in column 4 of Table
1. For all reasonable schemas it is closely approximated by equation (32).



8.3. Summary: Which LSFS Schemas Are Best? 

Which LSFS schema is best? If one wants the best accuracy and large $N$ is not a
problem, then the Minimum Redundancy (MRN) schema is ideal because it provides a flat,
featureless spectrum of high-frequency Fourier amplitudes in Figure 5. If one is willing to
sacrifice accuracy in favor of a smaller number of LO settings $N$, then one can consider the
MRN$^2$, MRN$^{1.7}$, and $3^N$ schemas. These provide comparable results for $N = 5$ and $N = 6$,
with MRN$^2$ having the edge. As a compromise between practicality and good results, our
numerical experiments suggest the MRN$^2$ schema, with $N > 4$.



9. SCHEMAS THAT HAVE min($\Delta i_{nn'}$) > 1

It is not strictly necessary for min($\Delta i_{nn'}$) to equal unity; rather, it can equal some
integer multiple, which we call R. Increasing R provides increased fractional LO coverage
h, which may be desirable. We envision two circumstances where R > 1 might be desirable.
One is when spectral resolution can be degraded and one can bin the data into groups of R
channels (§9.1). The other is when the number of spectral channels $I$ is large: matrix sizes
scale with $I$ and the computational load for inverting a matrix scales with the cube of the
matrix size, so using LSFS with large $I$ and R = 1 can be a problem. We can reduce this
problem by using $R^{th}$ sampling (§9.2).

9.1. Binning 

If one wants to derive the IF gain spectrum and has prior knowledge that it varies slowly 
with frequency, then one can increase the fractional LO coverage h by increasing R and, also, 
bin the data into R channels and sacrifice resolution. Mathematically, this combination is 
identical to a dataset with R = 1. 



9.2. $R^{th}$ Sampling

This technique retains full spectral resolution while keeping the matrix sizes manageable
by using the following subterfuge. Suppose I = 4096. Convert the 4096-long spectrum into
a series of R subspectra, each of length $I/R$, by choosing every $R^{th}$ point. For example,
for R = 8, one creates 8 subspectra, each 512 channels long, each spanning nearly the full
frequency range covered by the original 4096 channels. One uses LSFS on the R subspectra
independently and patches the R result spectra together. For each subspectrum, the smallest
LO spacing is equal to the bin separation, so each solution is mathematically identical to
that for 512 channels and R = 1.
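
The splitting and patching steps can be sketched with array slicing. The helper names `split_rth` and `patch_rth` are hypothetical, and the LSFS solution that would be applied to each subspectrum between the two steps is omitted; the example only shows that taking every R-th point and re-interleaving is lossless.

```python
import numpy as np

def split_rth(spectrum, R):
    """Return R subspectra, the r-th holding channels r, r+R, r+2R, ..."""
    return [spectrum[r::R] for r in range(R)]

def patch_rth(subspectra):
    """Interleave R equally long subspectra back into one full-resolution spectrum."""
    R = len(subspectra)
    out = np.empty(R * len(subspectra[0]), dtype=subspectra[0].dtype)
    for r, sub in enumerate(subspectra):
        out[r::R] = sub
    return out

spectrum = np.arange(4096, dtype=float)    # stand-in for a 4096-channel spectrum
subs = split_rth(spectrum, 8)              # 8 subspectra of 512 channels each
restored = patch_rth(subs)                 # (each would be solved by LSFS before patching)
```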

We did our numerical experiment for two schemes, the MR5,8-x scheme and the MR5$^2$,8-x
scheme; here the suffix "-x" signifies this $R^{th}$ sampling scheme. Table 1 lists the quality
indicators for these two experiments; they are comparable to the R = 1 versions, which is
reasonable because of the mathematical equivalence of the R solutions.

Fig. 7. Numerical experimental Fourier amplitude spectra of the IF gain spectra for the
two schemes MR5,8-x (left panels) and MR5$^2$,8-x (right panels). The bottom panels show
the full frequency range and cut off the peaks; the top panels show a magnified frequency
range centered at 512 cycles per IF band. Note the difference in vertical scale for the top
panels! The peaks repeat almost identically at multiples of 512.

The method is not perfect. Figure 7 shows the Fourier amplitude spectrum of the
4096-channel IF gain spectrum. The bottom panels show the full range of frequencies, 1
to 2048 cycles over the 4096-channel IF band. The most striking thing about these spectra
is the spikes, which lie at multiples of 512 cycles per IF band. This is easily understood,
because the band contains 4096 channels; 512 cycles per IF band corresponds to a period
of 8 channels, which is the value of R. This shows that each of the R subspectra is slightly
offset in power from the others, which is a result of their being reduced independently. Next,
the spikes repeat, almost identically, at multiples of 512. This repetition can be understood in
terms of the Fourier convolution theorem: the original data are, in effect, multiplied by a
function that repeats every R channels; in the transform domain, the spectrum is convolved
by the Fourier transform of this function, which produces the repetition.
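
The origin of the spikes can be illustrated with a toy calculation. In this sketch the per-subspectrum offsets are random numbers invented for the illustration, not values from our experiments; the point is only that a pattern repeating every R channels has Fourier power exclusively at multiples of I/R cycles per band.

```python
import numpy as np

I, R = 4096, 8
rng = np.random.default_rng(1)
offsets = rng.normal(scale=0.01, size=R)    # hypothetical offset of each subspectrum
pattern = np.tile(offsets, I // R)          # offset seen by each of the 4096 channels

amp = np.abs(np.fft.rfft(pattern))
spike_bins = np.flatnonzero(amp > 1e-9 * amp.max())   # bins carrying real power
print(spike_bins)
```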

The left-hand panels of Figure 7 show MRN,R-x and the right MRN$^2$,R-x. The top
panels show expanded views around 512 cycles per IF band. Note the difference in scale
for the top panels: the MRN,R-x schema has much bigger Fourier peaks at the 512-cycle
multiples and much smaller and more uniform Fourier amplitudes in between. In contrast,
the MRN$^2$,R-x has about four times smaller peaks, but stronger and less uniform Fourier
amplitudes in between.

One could reduce these Fourier spikes and gain accuracy in the derived results by as- 
sociating an additive constant for each of the R result spectra and devising a minimization 
procedure to determine the values of the R constants. More appealing is using the con- 
ventional, well-known technique of Wiener filtering to eliminate the spikes. Wiener filtering 
would work especially well for the MRN,R-x result because the Fourier power is so concen- 
trated. 



9.3. No Binning: the MR7,R Schemas 

Suppose that one observes with R > 1 but does neither of the above tricks, neither
binning nor $R^{th}$ sampling. How do the solutions fare under these conditions? We performed
numerical experiments for R = [2, 4, 8, 16] only for the MR7 scheme, and we denote these
LO setting arrangements by the name MR7,R. We present the three quality indicators in
Table 1 and show the derived IF gain spectra in Figure 8. As expected, larger R leads to
smaller F-Ampl[1], but the other two quality indicators are degraded. $R \geq 8$ breaks down
and is worthless.

Figure 8 exhibits IF gain difference spectra for R = [1, 2, 4, 8], plotted against the
background of the IF gain itself. The results for R = 1 and 2 look reasonable. The result for
R = 4 looks reasonable except for the systematic periodic signal in the difference spectrum.
This has a period of exactly 4 channels, which is the minimum LO spacing, and confirms
our hunch that one cannot accurately reconstruct the input signals with Fourier components
smaller than the minimum LO spacing. Of course. The Fourier amplitude spectra in Figure
5 show this behavior quantitatively, and show also that for R = 2 there is a systematic
periodicity of 2 channels. For R = 8 the solution breaks down and becomes worthless;
moreover, for 16 of the 256 trials the iterative solution discussed in §3.1 did not converge. For
R = 16 the solutions never converged.

Fig. 8. The noisy lines are the IF-gain difference spectra for R = [1, 2, 4, 8] versus $f_{IF}$ for
the MR7,R schema discussed in §9.3, scaled by the factors shown. The solid smooth line is
the IF gain $G_{IF}$ itself.

These results show that, with R > 1, one should use either the binning or the $R^{th}$
sampling trick.



10. POLARIZED STOKES PARAMETERS: SWITCHING AND LSFS 

Our above discussion applies to a single polarization, or to Stokes I. Here we generalize
to the three polarized Stokes parameters (Q, U, V). For discussion purposes, we discuss the
example of native orthogonal linear polarizations, which produce time-variable voltages X
and Y. We form Stokes parameters by taking time averages of the four possible products,
whose results we denote by, for example, XX. The four Stokes parameters include the sum
(XX + YY), the difference (XX - YY), the product 2XY, and the product with 90° phase
lag 2YX; for native linear polarizations, these combinations produce Stokes I, Q, U, and V,
respectively. (For native circular polarizations, these combinations would produce Stokes I,
V, U, and Q, respectively.) Part of the process of calibrating the Stokes parameters involves
applying the Mueller matrix calibrations to these combinations as discussed by Heiles et al.
(2001), and we assume these corrections have been applied to these time-average products
before applying the discussion below, which covers how to obtain Stokes spectra from
switching or the LSFS technique.

10.1. Position and Frequency Switching 

Generating I and Q requires taking the sums and differences. For position and frequency
switching, this is simply a matter of arithmetically combining the results in equations (5) and
(7), respectively.

Generating U and V requires crosscorrelating the polarizations. Consider U, which we
obtain from 2XY. We write the counterpart of equation (3) as

$$P_U(f_{IF}) = [G_{XX,IF}(f_{IF})\, G_{XX,RF}(f_{RF})]^{1/2}\, [G_{YY,IF}(f_{IF})\, G_{YY,RF}(f_{RF})]^{1/2}\, [(U_A(f_{RF}) + U_A) + (U_R(f_{RF}) + U_R)] . \qquad (33)$$

Here the square roots of the gain products G appear because U is derived by multiplying
voltages X and Y, while the gains $G_{XX}$ and $G_{YY}$ are power gains. We could write the
equivalent of equations (5) and (7), but these would be algebraically cumbersome and would not
convey much information. Usually, astronomical polarized signals are weak, so if we keep
only zeroth order terms then the position- and frequency-switched spectra are simplified
equivalents of equations (5) and (7), and they both become

<p"f/\~p,Jf'"L " W.f) + (U, - CQ] ^ (34) 

[PxxUIFjPyyUlFjr^ 

10.2. LSFS 

LSFS also applies to polarized Stokes parameters obtained from crosscorrelation. The
application of the LSFS procedure to XX and YY individually provides their associated RF
powers and, more importantly for now, the IF gains $G_{XX,IF}(f_{IF})$ and $G_{YY,IF}(f_{IF})$. Thus,
in equation (33) we can treat them as known quantities and move them to the left-hand side,
which is the same as pre-correcting the data for the IF gains.

With this, it is straightforward to duplicate the steps leading to equation [131 We begin 
with the analog of equation [HI 



$$S_U(f_{RF}) = [G_{XX,RF}(f_{RF})\, G_{YY,RF}(f_{RF})]^{1/2}\, [(U_A(f_{RF}) + U_A) + (U_R(f_{RF}) + U_R)] , \qquad (35)$$

and carrying through the algebra we arrive at the equivalent of equation (13)




(36) 



where to reduce clutter we write $G_{IF,i} = [G_{XX,IF,i}\, G_{YY,IF,i}]^{1/2}$.


The left-hand side contains the measured quantities and the right the desired unknown
ones. In contrast to equation (13), there is only a single unknown on the right-hand side. This
means that the least-squares solution for the unknowns is just the appropriate average of the
left-hand quantities. Alternatively, we can follow the line of development pursued in §3 and
express the solution as a least-squares problem using matrices. This is more time-consuming
computationally, but much simpler programmatically and offers more flexibility. In contrast
to the situation in §3, this least-squares fit is a linear one so it is not necessary to do an
iterative solution. Neither do we need to add the additional constraint embodied in equation 



Referring to §3.2, here the number of measurements is $M = NI$ ($I$ channels for each
of the $N$ LO frequencies) and because we do not have to determine the gains ($\delta G_i$) the
number of unknowns is only $a = (I + \Delta i_{N-1})$. To make the number of measurements exceed
the number of unknowns, we require $N > (1 + h)$, which is more easily satisfied than the
corresponding equation (22). The current X matrix has $NI$ rows and $(I + \Delta i_{N-1})$ columns.
For our textbook example of §3.3, the correspondent to equation (26) looks like



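$$X = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} \qquad (37)$$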



In this matrix, the arrangement of rows and columns is similar to that in equation (26) except
that the IF gains do not appear. Here in equation (37) we have

1. The first four rows pertain to the lowest LO frequency with $\Delta i_0 = 0$.

2. The next two groups of four rows pertain to $\Delta i_1 = 1$ and $\Delta i_2 = 3$.

3. The seven columns are the coefficients of the seven RF powers $\delta s_{i+\Delta i_n}$ in equation
(36).
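
The structure of this matrix, and the fact that its least-squares solution reduces to a simple average, can be checked numerically. The sketch below builds the matrix for I = 4 channels and LO offsets 0, 1, and 3; the function name `design_matrix` and the made-up "true" RF values are assumptions for illustration only.

```python
import numpy as np

def design_matrix(I, lo_offsets):
    """One row per (LO setting, IF channel); a single 1 marks the RF channel sampled."""
    n_unknowns = I + max(lo_offsets)
    X = np.zeros((len(lo_offsets) * I, n_unknowns))
    for n, offset in enumerate(lo_offsets):
        for i in range(I):
            X[n * I + i, i + offset] = 1.0
    return X

X = design_matrix(4, [0, 1, 3])                 # 12 measurements, 7 unknowns
true_rf = np.arange(1.0, 8.0)                   # invented RF values for the test
measured = X @ true_rf                          # noise-free, gain-corrected data
solution, *_ = np.linalg.lstsq(X, measured, rcond=None)
print(X.astype(int))
print(solution)
```

Because every row contains a single 1, the normal-equation matrix is diagonal, so each unknown is simply the mean of the measurements that sample it.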

11. LSFS versus Switching 

The traditional technique for heterodyne spectroscopy is called "switching" . It involves 
taking an off-source reference spectrum. This technique has been used by radioastronomers 
for decades. We offer a few comments that compare the two techniques. These comments 
are based mainly on our observing experience over the years and the numerical simulations 
discussed above. Unfortunately, we have not experimentally investigated these matters with 
LSFS spectra because our observing experience with this technique is limited to a handful 
of projects. 

11.1. Channel-to-channel noise 

First, consider $\sigma(IF)$, the channel-to-channel noise for the IF gain. Our figure of 
$\sigma(IF)$ for various LSFS schemas shows these dispersions normalized to the ideal, for which the 
noise is determined by the time-bandwidth product, where the time is the full integration 
time used for all LO settings. A value of unity would result if the LSFS fit provided no 
degradation in noise. All the values shown in that figure are less than 2. 

It is important to recall the noise in a conventional position- or frequency-switched 
spectrum. Conventionally, for a switched spectrum half the total observing time is spent 
on the OFF spectrum; this reduces the sensitivity of the ON spectrum by $\sqrt{2}$. Subtracting 
the equally noisy OFF spectrum reduces the sensitivity by another factor of $\sqrt{2}$. Thus, a 
conventional switched spectrum has $\sigma(IF) = 2$. All of the points displayed in the LSFS figure have 
better sensitivity than a conventional switched spectrum! 
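The factor of 2 can be checked with a small Monte Carlo sketch. The setup is an assumption made purely for illustration: unit-variance Gaussian noise per time sample, an arbitrary number of channels and samples, and a simple ON minus OFF difference standing in for the full switched-spectrum calibration.

```python
import random
import statistics

random.seed(0)             # fixed seed so the sketch is reproducible

n_channels = 2000          # independent spectral channels
n_samples = 64             # total integration time, in arbitrary sample units
half = n_samples // 2

ideal, switched = [], []
for _ in range(n_channels):
    x = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Ideal: all of the observing time is averaged into one spectrum.
    ideal.append(statistics.fmean(x))
    # Switched: half the time ON, half the time OFF, then ON - OFF.
    switched.append(statistics.fmean(x[:half]) - statistics.fmean(x[half:]))

sigma_ideal = statistics.pstdev(ideal)
sigma_switched = statistics.pstdev(switched)
ratio = sigma_switched / sigma_ideal   # expected to be close to 2
print(f"sigma(switched) / sigma(ideal) = {ratio:.2f}")
```

Halving the time per spectrum raises the noise by $\sqrt{2}$, and differencing two equally noisy spectra raises it by another $\sqrt{2}$, so the ratio should scatter around 2.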

11.2. Baseline wiggles 

Next, consider the slow undulations of the RF power spectrum with frequency, commonly 
referred to as "baseline wiggles". For conventional switched spectra, our earlier discussion 
shows that the baseline wiggles are determined by the frequency dependence of the RF 
power, which is in turn determined by reflections, the RF gain, and system temperature. 
These are instrumental properties associated with the telescope structure, RF amplifiers, 
and associated circuitry; they change slowly with time. Thus they do not tend to decrease 
with increased integration time. 

For LSFS spectra, there are two sources of baseline wiggles. One is the intrinsic wiggles 
that reside in the RF power spectrum. These often arise from reflections, as explained 
earlier. LSFS will not eliminate these. On the contrary, it is very useful for determining them, 
just as we did in our detailed discussion of reflection-induced baseline wiggles at Arecibo (Heiles 2005). 
We emphasize that these intrinsic wiggles are not artifacts introduced by LSFS because they 
are actually present in the RF power spectrum that enters the feed. LSFS will determine 
these intrinsic wiggles but, in contrast to frequency-switched spectra, it will not exacerbate 
them. 

The other type of baseline wiggle in LSFS spectra is associated with the fitting of 
low-frequency Fourier amplitudes, shown in the right-hand panel of the corresponding figure. The nonflat Fourier 
amplitudes reveal that some Fourier components are reproduced less accurately than others. 
In particular, the amplitudes increase for longer wavelengths, i.e., for slower variations with 
frequency, leading to baseline wiggles. 

This fitting type of LSFS wiggle should decrease with increased integration time because 
there should be no systematic bias in the phase of the fitted LSFS Fourier components. 
Rather, the amplitudes are less well determined, leading to increased noise, but the location 
of a positive-going fitted baseline ripple should change from one integration to the next. 
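The distinction between the two kinds of wiggle can be illustrated with a toy model. The following sketch is not an LSFS fit: it assumes a single unit-amplitude sinusoidal ripple per integration, with an arbitrary period and channel count, and simply compares averaging ripples of fixed phase (the intrinsic, instrumental case) against ripples of random phase (the fitting case).

```python
import math
import random

random.seed(1)                 # fixed seed for reproducibility

n_chan = 256                   # spectral channels
n_integrations = 100           # number of independent integrations averaged
period = 64.0                  # ripple period in channels (arbitrary choice)


def ripple(phase):
    """Unit-amplitude sinusoidal baseline ripple with the given phase."""
    return [math.sin(2 * math.pi * i / period + phase) for i in range(n_chan)]


def average(spectra):
    """Channel-by-channel mean of a list of spectra."""
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def rms(spec):
    """Root-mean-square value of a spectrum."""
    return math.sqrt(sum(v * v for v in spec) / len(spec))


# Fixed phase: the ripple sits at the same place in every integration.
fixed = average([ripple(0.3) for _ in range(n_integrations)])
# Random phase: the ripple lands somewhere different each time.
rand = average([ripple(random.uniform(0, 2 * math.pi))
                for _ in range(n_integrations)])

rms_fixed = rms(fixed)         # stays near 1/sqrt(2), however long we integrate
rms_random = rms(rand)         # shrinks roughly as 1/sqrt(n_integrations)
```

In this toy model the fixed-phase ripple survives averaging undiminished, while the random-phase ripple averages down like noise, which is the behavior argued for above.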



12. SUMMARY 

We described a new technique for obtaining accurate results from heterodyne spectroscopy. 
It involves taking measurements at three or more local oscillator (LO) frequencies and 
using least squares to derive the RF power and IF gain spectra as individual entities. We call 
this the Least-Squares Frequency-Switching (LSFS) technique. We have used the technique 
in two ways: one, to obtain a single IF gain spectrum used to correct a series of several thousand 
mapping measurements; and two, to obtain the RF power spectrum directly during a 
long integration. 



^See Heiles, Carl 2005, "Some Characteristics of ALFA's Fixed Pattern Noise (FPN)", Arecibo Technical 
memo 2005-04, available at http://www.naic.edu/science/techmemos_set.htm. 

We discussed mathematical details and computational requirements of the technique 
and explored optimum observing schemas using numerical experiments. The quality of the 
results depends on the choice of the LO frequencies, and §8.3 summarizes our results and 
recommendations. 

We illustrated the method with three real-life examples. In one of practical interest, 
we created a large contiguous bandwidth by aligning three smaller bandwidths end-to-end; radio 
astronomers are often faced with the need for a larger contiguous bandwidth than is provided 
by the available correlator. We also outlined an approach suitable for computationally 
difficult cases as the number of spectral channels grows beyond several thousand. 

This work was supported in part by NSF grant AST-0406987 and, also, by the NAIC. 
Josh Goldston Peek and Snezana Stanimirovic read an early version of the manuscript and 
made several important suggestions, and Benjamin Winkel made several valuable comments 
on a later draft. 